Overview

Dataset statistics

Number of variables: 11
Number of observations: 1623
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 139.6 KiB
Average record size in memory: 88.1 B

Variable types

NUM: 8
CAT: 3

Reproduction

Analysis started: 2020-07-12 14:10:36.867325
Analysis finished: 2020-07-12 14:10:48.920859
Duration: 12.05 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

loc.details has a high cardinality: 180 distinct values [High cardinality]
location has a high cardinality: 693 distinct values [High cardinality]
deposit_amount_2012 is highly correlated with deposit_amount_2011 and 4 other fields [High correlation]
deposit_amount_2011 is highly correlated with deposit_amount_2012 and 4 other fields [High correlation]
deposit_amount_2013 is highly correlated with deposit_amount_2011 and 4 other fields [High correlation]
deposit_amount_2014 is highly correlated with deposit_amount_2011 and 4 other fields [High correlation]
deposit_amount_2015 is highly correlated with deposit_amount_2011 and 4 other fields [High correlation]
deposit_amount_2016 is highly correlated with deposit_amount_2011 and 4 other fields [High correlation]
deposit_amount_2011 is highly skewed (γ1 = 39.81107704) [Skewed]
deposit_amount_2012 is highly skewed (γ1 = 39.64549677) [Skewed]
deposit_amount_2013 is highly skewed (γ1 = 39.49573447) [Skewed]
deposit_amount_2014 is highly skewed (γ1 = 39.49528758) [Skewed]
deposit_amount_2015 is highly skewed (γ1 = 39.28702403) [Skewed]
deposit_amount_2016 is highly skewed (γ1 = 39.62477434) [Skewed]
id has unique values [Unique]
deposit_amount_2011 has 91 (5.6%) zeros [Zeros]
deposit_amount_2012 has 92 (5.7%) zeros [Zeros]
deposit_amount_2013 has 92 (5.7%) zeros [Zeros]
deposit_amount_2014 has 92 (5.7%) zeros [Zeros]
deposit_amount_2015 has 92 (5.7%) zeros [Zeros]
deposit_amount_2016 has 92 (5.7%) zeros [Zeros]
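
The quantities behind the Zeros, Skewed and High cardinality warnings can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names and the toy `deposits` column are assumptions, and the skewness here is the plain moment estimator g1 = m3 / m2^1.5, which may differ slightly from the bias-corrected estimator a profiling library reports.

```python
def zero_share(values):
    """Return (count, fraction) of entries that are exactly zero."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, zeros / len(values)


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def distinct_count(values):
    """Number of distinct values, the basis of a cardinality check."""
    return len(set(values))


# Hypothetical toy column: a few zeros, many modest amounts, one huge outlier.
deposits = [0.0] * 6 + [float(50_000 + 1_000 * i) for i in range(93)] + [9.5e8]

zeros, share = zero_share(deposits)
toy_skew = skewness(deposits)
print(f"zeros: {zeros} ({share:.1%})")
print(f"skewness: {toy_skew:.2f}")
print(f"distinct values: {distinct_count(deposits)}")
```

When a single extreme value dominates n observations, g1 approaches (n - 2) / sqrt(n - 1). For n = 1623 that limit is about 40.2, which is close to the γ1 values reported above and suggests that one very large record drives the skew.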

Variables

id
Real number (ℝ≥0)

UNIQUE

Distinct count: 1623
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 812.0
Minimum: 1
Maximum: 1623
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 82.1
Q1: 406.5
Median: 812
Q3: 1217.5
95-th percentile: 1541.9
Maximum: 1623
Range: 1622
Interquartile range (IQR): 811

Descriptive statistics

Standard deviation: 468.6640588
Coefficient of variation (CV): 0.5771724862
Kurtosis: -1.2
Mean: 812
Median Absolute Deviation (MAD): 406
Skewness: 0
Sum: 1317876
Variance: 219646
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
1623                  1      0.1%
1088                  1      0.1%
1068                  1      0.1%
1070                  1      0.1%
1072                  1      0.1%
1074                  1      0.1%
1076                  1      0.1%
1078                  1      0.1%
1080                  1      0.1%
1082                  1      0.1%
Other values (1613)   1613   99.4%

Minimum 5 values

Value                 Count  Frequency (%)
1                     1      0.1%
2                     1      0.1%
3                     1      0.1%
4                     1      0.1%
5                     1      0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1623                  1      0.1%
1622                  1      0.1%
1621                  1      0.1%
1620                  1      0.1%
1619                  1      0.1%
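
Because id simply runs from 1 to 1623, the statistics reported for it can be checked by hand. The sketch below is one way to do that with the Python standard library. It assumes quantiles are computed by linear interpolation between neighbouring order statistics and that the variance uses the n - 1 denominator; both assumptions reproduce the figures listed for id, but they are inferred from the numbers rather than stated in the report. The names `quantile` and `summary` are illustrative.

```python
from math import sqrt
from statistics import median


def quantile(sorted_values, q):
    """Quantile q in [0, 1] by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


ids = list(range(1, 1624))  # 1, 2, ..., 1623
n = len(ids)
mean = sum(ids) / n
variance = sum((v - mean) ** 2 for v in ids) / (n - 1)  # sample variance
std = sqrt(variance)
med = median(ids)

summary = {
    "sum": sum(ids),
    "mean": mean,
    "variance": variance,
    "std": std,
    "cv": std / mean,  # coefficient of variation
    "mad": median(abs(v - med) for v in ids),  # median absolute deviation
    "p5": quantile(ids, 0.05),
    "q1": quantile(ids, 0.25),
    "median": med,
    "q3": quantile(ids, 0.75),
    "p95": quantile(ids, 0.95),
}
summary["iqr"] = summary["q3"] - summary["q1"]

for name, value in summary.items():
    print(f"{name}: {value}")
```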
 

deposit_amount_2011
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1520
Unique (%): 93.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 836240.7791127542
Minimum: 0.0
Maximum: 949696500.0
Zeros: 91
Zeros (%): 5.6%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 50784.75
Median: 98578.5
Q3: 181356.75
95-th percentile: 461689.95
Maximum: 949696500
Range: 949696500
Interquartile range (IQR): 130572

Descriptive statistics

Standard deviation: 23664391.51
Coefficient of variation (CV): 28.29853805
Kurtosis: 1596.53699
Mean: 836240.7791
Median Absolute Deviation (MAD): 56863.5
Skewness: 39.81107704
Sum: 1357218784
Variance: 5.600034254e+14
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
0                     91     5.6%
96172.5               2      0.1%
61548                 2      0.1%
31500                 2      0.1%
181585.5              2      0.1%
81405                 2      0.1%
126115.5              2      0.1%
67957.5               2      0.1%
41809.5               2      0.1%
154339.5              2      0.1%
Other values (1510)   1514   93.3%

Minimum 5 values

Value                 Count  Frequency (%)
0                     91     5.6%
1.5                   1      0.1%
247.5                 1      0.1%
2107.5                1      0.1%
5209.5                1      0.1%

Maximum 5 values

Value                 Count  Frequency (%)
949696500             1      0.1%
67648614              1      0.1%
39534582              1      0.1%
33151629              1      0.1%
10430034              1      0.1%

deposit_amount_2012
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1527
Unique (%): 94.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 984752.9593345657
Minimum: 0.0
Maximum: 1114902000.0
Zeros: 92
Zeros (%): 5.7%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 53303.25
Median: 103470
Q3: 193895.25
95-th percentile: 481739.85
Maximum: 1114902000
Range: 1114902000
Interquartile range (IQR): 140592

Descriptive statistics

Standard deviation: 27822195.32
Coefficient of variation (CV): 28.25296949
Kurtosis: 1587.10057
Mean: 984752.9593
Median Absolute Deviation (MAD): 60190.5
Skewness: 39.64549677
Sum: 1598254053
Variance: 7.740745524e+14
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
0                     92     5.7%
27235.5               2      0.1%
70248                 2      0.1%
48249                 2      0.1%
66639                 2      0.1%
68136                 2      0.1%
59295                 1      0.1%
32187                 1      0.1%
488785.5              1      0.1%
202921.5              1      0.1%
Other values (1517)   1517   93.5%

Minimum 5 values

Value                 Count  Frequency (%)
0                     92     5.7%
237                   1      0.1%
2020.5                1      0.1%
3277.5                1      0.1%
4963.5                1      0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1114902000            1      0.1%
93946476              1      0.1%
55487037              1      0.1%
41281806              1      0.1%
15721390.5            1      0.1%

deposit_amount_2013
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1524
Unique (%): 93.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1107470.495378928
Minimum: 0.0
Maximum: 1248682500.0
Zeros: 92
Zeros (%): 5.7%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 57223.5
Median: 112302
Q3: 207204
95-th percentile: 528510.3
Maximum: 1248682500
Range: 1248682500
Interquartile range (IQR): 149980.5

Descriptive statistics

Standard deviation: 31203622.44
Coefficient of variation (CV): 28.17557901
Kurtosis: 1578.406914
Mean: 1107470.495
Median Absolute Deviation (MAD): 64885.5
Skewness: 39.49573447
Sum: 1797424614
Variance: 9.736660537e+14
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
0                     92     5.7%
173326.5              2      0.1%
137647.5              2      0.1%
127350                2      0.1%
34819.5               2      0.1%
46689                 2      0.1%
136798.5              2      0.1%
53697                 2      0.1%
77494.5               2      0.1%
108627                1      0.1%
Other values (1514)   1514   93.3%

Minimum 5 values

Value                 Count  Frequency (%)
0                     92     5.7%
243                   1      0.1%
1920                  1      0.1%
4131                  1      0.1%
5425.5                1      0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1248682500            1      0.1%
122940568.5           1      0.1%
58192191              1      0.1%
54329896.5            1      0.1%
18817225.5            1      0.1%

deposit_amount_2014
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1529
Unique (%): 94.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1220472.520332717
Minimum: 0.0
Maximum: 1374814500.0
Zeros: 92
Zeros (%): 5.7%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 61818.75
Median: 120445.5
Q3: 226321.5
95-th percentile: 578735.85
Maximum: 1374814500
Range: 1374814500
Interquartile range (IQR): 164502.75

Descriptive statistics

Standard deviation: 34354845.66
Coefficient of variation (CV): 28.14880719
Kurtosis: 1578.512689
Mean: 1220472.52
Median Absolute Deviation (MAD): 71122.5
Skewness: 39.49528758
Sum: 1980826900
Variance: 1.18025542e+15
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
0                     92     5.7%
27117                 2      0.1%
136974                2      0.1%
76243.5               2      0.1%
253951.5              1      0.1%
53421                 1      0.1%
55195.5               1      0.1%
163495.5              1      0.1%
172339.5              1      0.1%
105888                1      0.1%
Other values (1519)   1519   93.6%

Minimum 5 values

Value                 Count  Frequency (%)
0                     92     5.7%
208.5                 1      0.1%
3394.5                1      0.1%
4162.5                1      0.1%
4657.5                1      0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1374814500            1      0.1%
127766710.5           1      0.1%
78427267.5            1      0.1%
58728975              1      0.1%
21045706.5            1      0.1%

deposit_amount_2015
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1525
Unique (%): 94.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1388776.3086876154
Minimum: 0.0
Maximum: 1548823500.0
Zeros: 92
Zeros (%): 5.7%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 64618.5
Median: 127141.5
Q3: 238721.25
95-th percentile: 614772.45
Maximum: 1548823500
Range: 1548823500
Interquartile range (IQR): 174102.75

Descriptive statistics

Standard deviation: 38776096.5
Coefficient of variation (CV): 27.9210527
Kurtosis: 1566.640917
Mean: 1388776.309
Median Absolute Deviation (MAD): 75238.5
Skewness: 39.28702403
Sum: 2253983949
Variance: 1.50358566e+15
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
0                     92     5.7%
95983.5               2      0.1%
67686                 2      0.1%
73122                 2      0.1%
65986.5               2      0.1%
216211.5              2      0.1%
336804                2      0.1%
127141.5              2      0.1%
132466.5              1      0.1%
179373                1      0.1%
Other values (1515)   1515   93.3%

Minimum 5 values

Value                 Count  Frequency (%)
0                     92     5.7%
199.5                 1      0.1%
3805.5                1      0.1%
4023                  1      0.1%
5275.5                1      0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1548823500            1      0.1%
147483243             1      0.1%
123612354             1      0.1%
70655661              1      0.1%
27182524.5            1      0.1%

deposit_amount_2016
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1526
Unique (%): 94.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1412397.887245841
Minimum: 0.0
Maximum: 1604137500.0
Zeros: 92
Zeros (%): 5.7%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 69571.5
Median: 134331
Q3: 258485.25
95-th percentile: 648135.45
Maximum: 1604137500
Range: 1604137500
Interquartile range (IQR): 188913.75

Descriptive statistics

Standard deviation: 40037728.32
Coefficient of variation (CV): 28.34734368
Kurtosis: 1586.037873
Mean: 1412397.887
Median Absolute Deviation (MAD): 79812
Skewness: 39.62477434
Sum: 2292321771
Variance: 1.603019689e+15
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
0                     92     5.7%
100440                2      0.1%
89076                 2      0.1%
105454.5              2      0.1%
21832.5               2      0.1%
160029                2      0.1%
68163                 2      0.1%
205990.5              1      0.1%
70015.5               1      0.1%
125809.5              1      0.1%
Other values (1516)   1516   93.4%

Minimum 5 values

Value                 Count  Frequency (%)
0                     92     5.7%
177                   1      0.1%
3042                  1      0.1%
3943.5                1      0.1%
5283                  1      0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1604137500            1      0.1%
126853557             1      0.1%
91053255              1      0.1%
64990429.5            1      0.1%
28283470.5            1      0.1%

loc.details
Categorical

HIGH CARDINALITY

Distinct count: 180
Unique (%): 11.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB

Common values

Value                 Count  Frequency (%)
Maricopa              84     5.2%
Harris                77     4.7%
New York              76     4.7%
Cook                  65     4.0%
Dallas                60     3.7%
Wayne                 58     3.6%
Queens                41     2.5%
Kings                 38     2.3%
Franklin              37     2.3%
Oakland               36     2.2%
Other values (170)    1051   64.8%

Length

Max length: 20
Median length: 7
Mean length: 6.964879852
Min length: 3

location
Categorical

HIGH CARDINALITY

Distinct count: 693
Unique (%): 42.7%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB

Common values

Value                 Count  Frequency (%)
New York City         78     4.8%
Houston               67     4.1%
Brooklyn              37     2.3%
Dallas                33     2.0%
Phoenix               31     1.9%
Bronx                 31     1.9%
Tucson                27     1.7%
Columbus              25     1.5%
Baton Rouge           24     1.5%
Austin                22     1.4%
Other values (683)    1248   76.9%

Length

Max length: 19
Median length: 8
Mean length: 8.854590265
Min length: 3

state
Categorical

Distinct count: 14
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB

Common values

Value                 Count  Frequency (%)
NY                    339    20.9%
TX                    282    17.4%
OH                    252    15.5%
MI                    208    12.8%
AZ                    144    8.9%
LA                    140    8.6%
IL                    119    7.3%
NJ                    39     2.4%
CT                    28     1.7%
WV                    26     1.6%
Other values (4)      46     2.8%

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

age_of_bank
Real number (ℝ≥0)

Distinct count: 129
Unique (%): 7.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 139.8706099815157
Minimum: 12
Maximum: 216
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 12
5-th percentile: 22
Q1: 52
Median: 216
Q3: 216
95-th percentile: 216
Maximum: 216
Range: 204
Interquartile range (IQR): 164

Descriptive statistics

Standard deviation: 82.57293913
Coefficient of variation (CV): 0.5903523202
Kurtosis: -1.778114014
Mean: 139.87061
Median Absolute Deviation (MAD): 0
Skewness: -0.2670285328
Sum: 227010
Variance: 6818.290276
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
216                   842    51.9%
19                    28     1.7%
43                    24     1.5%
56                    21     1.3%
54                    20     1.2%
37                    20     1.2%
53                    19     1.2%
42                    18     1.1%
36                    17     1.0%
58                    17     1.0%
Other values (119)    597    36.8%

Minimum 5 values

Value                 Count  Frequency (%)
12                    4      0.2%
13                    4      0.2%
14                    3      0.2%
15                    7      0.4%
16                    4      0.2%

Maximum 5 values

Value                 Count  Frequency (%)
216                   842    51.9%
205                   2      0.1%
202                   2      0.1%
193                   2      0.1%
177                   2      0.1%
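
The MAD of 0 reported for age_of_bank follows from the frequency table: 842 of the 1623 records (51.9%) equal 216, so the median is 216 and more than half of the absolute deviations from it are zero. The sketch below illustrates this with the standard library. Only the count of 842 comes from the report; the remaining 781 values are replaced by a hypothetical placeholder (12), and the helper name `mad` is an assumption.

```python
from statistics import median


def mad(values):
    """Median absolute deviation around the median (no scaling constant)."""
    med = median(values)
    return median(abs(v - med) for v in values)


# 842 records at 216 (from the report) plus 781 placeholder values.
ages = [216] * 842 + [12] * 781

print(median(ages))  # the majority value is also the median
print(mad(ages))     # more than half of the deviations are zero
```

Any choice of the other 781 values gives the same MAD, because the result depends only on a majority of the records sharing the median value.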
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a perfectly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
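
As a minimal sketch of the calculation just described, the following Python function computes r for two equal-length sequences using only the standard library. The name `pearson_r` and the sample data are illustrative assumptions and are not taken from the report.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations.

    The 1/n factors in the covariance and the standard deviations cancel,
    so plain sums of products are used.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

r_original = pearson_r(xs, ys)
# Shifting and rescaling one variable (positive scale) leaves r unchanged.
r_shifted = pearson_r([10 * x + 3 for x in xs], ys)
print(r_original, r_shifted)
```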

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
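
The rank-based calculation can be sketched as follows, again with the standard library only. Tied values receive the average of the ranks they span, which is a common convention but an assumption here; the helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman_rho(xs, ys):
    """Pearson correlation applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]  # monotonic but nonlinear in xs

print(pearson_r(xs, cubes))     # below 1: the relationship is not linear
print(spearman_rho(xs, cubes))  # 1 up to rounding: the ranks agree exactly
```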

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
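
The pair-counting definition above corresponds to the variant usually called tau-a, which makes no correction for ties. A minimal sketch under that assumption is shown below; libraries often report a tie-adjusted variant instead, so results can differ when the data contain ties. The function name `kendall_tau_a` and the sample data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1  # both variables order the pair the same way
        elif direction < 0:
            discordant += 1  # the variables order the pair in opposite ways
    return (concordant - discordant) / len(pairs)


# One swapped neighbour pair out of six pairs: 5 concordant, 1 discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```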

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

Sample

First rows

      id    deposit_amount_2011  deposit_amount_2012  deposit_amount_2013  deposit_amount_2014  deposit_amount_2015  deposit_amount_2016  loc.details  location      state  age_of_bank
0     1     949696500.0          1.114902e+09         1.248682e+09         1.374814e+09         1.548824e+09         1.604138e+09         Delaware     Columbus      OH     193
1     2     439843.5             4.661865e+05         4.886130e+05         4.918950e+05         4.916880e+05         5.122125e+05         Westchester  Scarsdale     NY     216
2     3     286516.5             3.103995e+05         3.246585e+05         3.569745e+05         3.512745e+05         3.936825e+05         Nassau       Great Neck    NY     53
3     4     130665.0             1.325505e+05         1.397445e+05         1.644885e+05         1.679775e+05         1.751580e+05         Westchester  Hartsdale     NY     216
4     5     258912.0             2.591235e+05         2.841195e+05         2.976675e+05         3.077970e+05         3.348000e+05         Nassau       Lawrence      NY     216
5     6     220230.0             2.050080e+05         2.110170e+05         2.314695e+05         2.230605e+05         2.182485e+05         Westchester  Mount Vernon  NY     216
6     7     112696.5             1.202580e+05         1.234995e+05         1.418070e+05         1.455690e+05         1.607490e+05         Bronx        Bronx         NY     51
7     8     59832.0              6.381900e+04         6.570000e+04         6.880050e+04         7.704450e+04         8.503950e+04         Bronx        Bronx         NY     216
8     9     110553.0             1.050735e+05         1.056705e+05         1.184190e+05         1.210155e+05         1.241340e+05         Bronx        Bronx         NY     216
9     10    104667.0             1.092240e+05         1.120935e+05         1.132050e+05         1.181385e+05         1.259670e+05         Bronx        Bronx         NY     216

Last rows

      id    deposit_amount_2011  deposit_amount_2012  deposit_amount_2013  deposit_amount_2014  deposit_amount_2015  deposit_amount_2016  loc.details  location      state  age_of_bank
1613  1614  196357.5             212155.5             227674.5             282112.5             198720.0             212781.0             Milwaukee    Fox Point     WI     216
1614  1615  30301.5              33112.5              38347.5              39847.5              43236.0              43119.0              Milwaukee    Milwaukee     WI     95
1615  1616  56086.5              58680.0              62710.5              71485.5              73122.0              76455.0              Milwaukee    Milwaukee     WI     216
1616  1617  0.0                  0.0                  0.0                  0.0                  0.0                  0.0                  Milwaukee    Milwaukee     WI     216
1617  1618  53412.0              55384.5              61980.0              62097.0              63099.0              67599.0              Milwaukee    Milwaukee     WI     104
1618  1619  103951.5             133564.5             138643.5             150294.0             159280.5             152766.0             Milwaukee    Cudahy        WI     107
1619  1620  98406.0              105657.0             114579.0             124258.5             139989.0             150336.0             Milwaukee    Wauwatosa     WI     216
1620  1621  83460.0              86874.0              98116.5              124689.0             126501.0             137949.0             Ozaukee      Mequon        WI     216
1621  1622  81405.0              89365.5              98139.0              93705.0              120355.5             122323.5             Waukesha     Delafield     WI     216
1622  1623  25537.5              25537.5              28282.5              30828.0              35551.5              35727.0              Waukesha     Eagle         WI     216